NSF PAR Search | NSF Public Access Repository

Sustaining Human Agency, Attending to Its Cost: An Investigation into Generative AI Design for Non-Native Speakers' Language Use

https://doi.org/10.1145/3706598.3713626

Xiao, Yimin; Hancock, Cartor; Agrawal, Sweta; Mehandru, Nikita; Salehi, Niloufar; Carpuat, Marine; Gao, Ge (April 2025, ACM)

Free, publicly-accessible full text available April 25, 2026
Sustaining Human Agency, Attending to Its Cost: An Investigation into Generative AI Design for Non-Native Speakers' Language Use

Xiao, Yimin; Hancock, Cartor; Agrawal, Sweta; Mehandru, Nikita; Salehi, Niloufar; Carpuat, Marine; Gao, Ge (April 2025, arXiv)

Free, publicly-accessible full text available April 20, 2026
Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension

https://doi.org/10.1162/tacl_a_00653

Agrawal, Sweta; Carpuat, Marine (January 2024, Transactions of the Association for Computational Linguistics)

Automatic text simplification (TS) aims to automate the process of rewriting text to make it easier for people to read. A pre-requisite for TS to be useful is that it should convey information that is consistent with the meaning of the original text. However, current TS evaluation protocols assess system outputs for simplicity and meaning preservation without regard for the document context in which output sentences occur and for how people understand them. In this work, we introduce a human evaluation framework to assess whether simplified texts preserve meaning using reading comprehension questions. With this framework, we conduct a thorough human evaluation of texts by humans and by nine automatic systems. Supervised systems that leverage pre-training knowledge achieve the highest scores on the reading comprehension tasks among the automatic controllable TS systems. However, even the best-performing supervised system struggles with at least 14% of the questions, marking them as “unanswerable” based on simplified content. We further investigate how existing TS evaluation metrics and automatic question-answering systems approximate the human judgments we obtained.
Full Text Available
Controlling Pre-trained Language Models for Grade-Specific Text Simplification

https://doi.org/10.18653/v1/2023.emnlp-main.790

Agrawal, Sweta; Carpuat, Marine (January 2023, Association for Computational Linguistics)

Full Text Available
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

https://doi.org/10.1162/tacl_a_00563

Xu, Weijia; Agrawal, Sweta; Briakou, Eleftheria; Martindale, Marianna J; Carpuat, Marine (January 2023, Transactions of the Association for Computational Linguistics)

Neural sequence generation models are known to “hallucinate”, by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.
Full Text Available
Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors

https://doi.org/10.18653/v1/2023.emnlp-main.712

Mehandru, Nikita; Agrawal, Sweta; Xiao, Yimin; Gao, Ge; Khoong, Elaine; Carpuat, Marine; Salehi, Niloufar (January 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)

A major challenge in the practical use of Machine Translation (MT) is that users lack information on translation quality to make informed decisions about how to rely on outputs. Progress in quality estimation research provides techniques to automatically assess MT quality, but these techniques have primarily been evaluated in vitro by comparison against human judgments outside of a specific context of use. This paper evaluates quality estimation feedback in vivo with a human study in realistic high-stakes medical settings. Using Emergency Department discharge instructions, we study how interventions based on quality estimation versus backtranslation assist physicians in deciding whether to show MT outputs to a patient. We find that quality estimation improves appropriate reliance on MT, but backtranslation helps physicians detect more clinically harmful errors that QE alone often misses.